protein-ligand complex
FLOWR.root: A flow matching based foundation model for joint multi-purpose structure-aware 3D ligand generation and affinity prediction
Cremer, Julian, Le, Tuan, Ghahremanpour, Mohammad M., Sługocka, Emilia, Menezes, Filipe, Clevert, Djork-Arné
We present FLOWR:root, an equivariant flow-matching model for pocket-aware 3D ligand generation with joint binding affinity prediction and confidence estimation. The model supports de novo generation, pharmacophore-conditional sampling, fragment elaboration, and multi-endpoint affinity prediction (pIC50, pKi, pKd, pEC50). Training combines large-scale ligand libraries with mixed-fidelity protein-ligand complexes, followed by refinement on curated co-crystal datasets and parameter-efficient finetuning for project-specific adaptation. FLOWR:root achieves state-of-the-art performance in unconditional 3D molecule generation and pocket-conditional ligand design, producing geometrically realistic, low-strain structures. The integrated affinity prediction module demonstrates superior accuracy on the SPINDR test set and outperforms recent models on the Schrödinger FEP+/OpenFE benchmark with substantial speed advantages. As a foundation model, FLOWR:root requires finetuning on project-specific datasets to account for unseen structure-activity landscapes, yielding strong correlation with experimental data. Joint generation and affinity prediction enable inference-time scaling through importance sampling, steering molecular design toward higher-affinity compounds. Case studies validate this: selective CK2α ligand generation against CLK3 shows significant correlation between predicted and quantum-mechanical binding energies, while ERα, TYK2 and BACE1 scaffold elaboration demonstrates strong agreement with QM calculations. By integrating structure-aware generation, affinity estimation, and property-guided sampling, FLOWR:root provides a comprehensive foundation for structure-based drug design spanning hit identification through lead optimization.
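The abstract above mentions steering generation toward higher-affinity compounds through importance sampling but gives no details of the weighting. The sketch below illustrates the general idea only: it resamples a pool of generated candidates with self-normalized weights proportional to exp(predicted affinity / temperature). The function names, the exponential weighting, and the `temperature` parameter are assumptions made for illustration, not the scheme used by FLOWR:root.

```python
import math
import random


def affinity_weights(predicted_affinities, temperature=1.0):
    """Return self-normalized weights proportional to exp(affinity / temperature).

    Hypothetical helper: the exponential form is an assumed choice.
    """
    if not predicted_affinities:
        raise ValueError("need at least one predicted affinity")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(predicted_affinities)
    unnormalized = [math.exp((a - peak) / temperature) for a in predicted_affinities]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def resample_by_affinity(candidates, predicted_affinities, k, temperature=1.0, seed=0):
    """Draw k candidates (with replacement) in proportion to their weights."""
    weights = affinity_weights(predicted_affinities, temperature)
    rng = random.Random(seed)
    return rng.choices(candidates, weights=weights, k=k)
```

A lower `temperature` concentrates the resampled pool on the top-scoring candidates, while a higher value keeps it closer to the original distribution.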
A Geometric Graph-Based Deep Learning Model for Drug-Target Affinity Prediction
Rana, Md Masud, Mukta, Farjana Tasnim, Nguyen, Duc D.
In structure-based drug design, accurately estimating the binding affinity between a candidate ligand and its protein receptor is a central challenge. Recent advances in artificial intelligence, particularly deep learning, have demonstrated superior performance over traditional empirical and physics-based methods for this task, enabled by the growing availability of structural and experimental affinity data. In this work, we introduce DeepGGL, a deep convolutional neural network that integrates residual connections and an attention mechanism within a geometric graph learning framework. By leveraging multiscale weighted colored bipartite subgraphs, DeepGGL effectively captures fine-grained atom-level interactions in protein-ligand complexes across multiple scales. We benchmarked DeepGGL against established models on CASF-2013 and CASF-2016, where it achieved state-of-the-art performance with significant improvements across diverse evaluation metrics. To further assess robustness and generalization, we tested the model on the CSAR-NRC-HiQ dataset and the PDBbind v2019 holdout set. DeepGGL consistently maintained high predictive accuracy, highlighting its adaptability and reliability for binding affinity prediction in structure-based drug discovery.
Contrastive Multi-Task Learning with Solvent-Aware Augmentation for Drug Discovery
Lan, Jing, Ding, Hexiao, Chen, Hongzhao, Jiang, Yufeng, Ng, Nga-Chun, Cheng, Gerald W. Y., Li, Zongxi, Cai, Jing, Lin, Liang-ting, Yoo, Jung Sun
Accurate prediction of protein-ligand interactions is essential for computer-aided drug discovery. However, existing methods often fail to capture solvent-dependent conformational changes and lack the ability to jointly learn multiple related tasks. To address these limitations, we introduce a pre-training method that incorporates ligand conformational ensembles generated under diverse solvent conditions as augmented input. This design enables the model to learn both structural flexibility and environmental context in a unified manner. The training process integrates molecular reconstruction to capture local geometry, interatomic distance prediction to model spatial relationships, and contrastive learning to build solvent-invariant molecular representations. Together, these components lead to significant improvements, including a 3.7% gain in binding affinity prediction, an 82% success rate on the PoseBusters Astex docking benchmarks, and an area under the curve of 97.1% in virtual screening. The framework supports solvent-aware, multi-task modeling and produces consistent results across benchmarks. A case study further demonstrates sub-angstrom docking accuracy with a root-mean-square deviation of 0.157 angstroms, offering atomic-level insight into binding mechanisms and advancing structure-based drug design.
DecoyDB: A Dataset for Graph Contrastive Learning in Protein-Ligand Binding Affinity Prediction
Zhang, Yupu, Xu, Zelin, Xiao, Tingsong, Seabra, Gustavo, Li, Yanjun, Li, Chenglong, Jiang, Zhe
Predicting the binding affinity of protein-ligand complexes plays a vital role in drug discovery. Unfortunately, progress has been hindered by the lack of large-scale and high-quality binding affinity labels. The widely used PDBbind dataset has fewer than 20K labeled complexes. Self-supervised learning, especially graph contrastive learning (GCL), provides a unique opportunity to break the barrier by pre-training graph neural network models based on vast unlabeled complexes and fine-tuning the models on much fewer labeled complexes. However, the problem faces unique challenges, including a lack of a comprehensive unlabeled dataset with well-defined positive/negative complex pairs and the need to design GCL algorithms that incorporate the unique characteristics of such data. To fill the gap, we propose DecoyDB, a large-scale, structure-aware dataset specifically designed for self-supervised GCL on protein-ligand complexes. DecoyDB consists of high-resolution ground truth complexes (less than 2.5 Angstrom) and diverse decoy structures with computationally generated binding poses that range from realistic to suboptimal (negative pairs). Each decoy is annotated with a Root Mean Squared Deviation (RMSD) from the native pose. We further design a customized GCL framework to pre-train graph neural networks based on DecoyDB and fine-tune the models with labels from PDBbind. Extensive experiments confirm that models pre-trained with DecoyDB achieve superior accuracy, label efficiency, and generalizability.
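As a small illustration of the RMSD annotation described above, the sketch below computes the root-mean-square deviation between a decoy pose and the native pose. It assumes both poses list the same atoms in the same order and share the receptor's coordinate frame, so no superposition is applied. The function name and the plain-tuple input format are hypothetical and are not taken from DecoyDB.

```python
import math


def pose_rmsd(decoy_coords, native_coords):
    """RMSD in the input units between two poses with matching atom order.

    Each pose is a sequence of (x, y, z) tuples; no alignment is performed.
    """
    if len(decoy_coords) != len(native_coords) or not decoy_coords:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared_sum = 0.0
    for (xd, yd, zd), (xn, yn, zn) in zip(decoy_coords, native_coords):
        squared_sum += (xd - xn) ** 2 + (yd - yn) ** 2 + (zd - zn) ** 2
    return math.sqrt(squared_sum / len(decoy_coords))
```

A symmetry-aware RMSD, which accounts for equivalent atoms in symmetric ligands, would need an atom-mapping step that this sketch omits.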
TransDiffSBDD: Causality-Aware Multi-Modal Structure-Based Drug Design
Hu, Xiuyuan, Liu, Guoqing, Chen, Can, Zhao, Yang, Zhang, Hao, Liu, Xue
Structure-based drug design (SBDD) is a critical task in drug discovery, requiring the generation of molecular information across two distinct modalities: discrete molecular graphs and continuous 3D coordinates. However, existing SBDD methods often overlook two key challenges: (1) the multi-modal nature of this task and (2) the causal relationship between these modalities, limiting their plausibility and performance. To address both challenges, we propose TransDiffSBDD, an integrated framework combining autoregressive transformers and diffusion models for SBDD. Specifically, the autoregressive transformer models discrete molecular information, while the diffusion model samples continuous distributions, effectively resolving the first challenge. To address the second challenge, we design a hybrid-modal sequence for protein-ligand complexes that explicitly respects the causality between modalities. Experiments on the CrossDocked2020 benchmark demonstrate that TransDiffSBDD outperforms existing baselines.
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery
Zhu, Yiheng, Li, Mingyang, Liu, Junlong, Fu, Kun, Wu, Jiansheng, Li, Qiuyi, Yin, Mingze, Ye, Jieping, Wu, Jian, Wang, Zheng
Structure-based drug discovery (SBDD) is a systematic scientific process that develops new drugs by leveraging the detailed physical structure of the target protein. Recent advancements in pre-trained models for biomolecules have demonstrated remarkable success across various biochemical applications, including drug discovery and protein engineering. However, in most approaches, the pre-trained models primarily focus on the characteristics of either small molecules or proteins, without delving into their binding interactions which are essential cross-domain relationships pivotal to SBDD. To fill this gap, we propose a general-purpose foundation model named BIT (an abbreviation for Biomolecular Interaction Transformer), which is capable of encoding a range of biochemical entities, including small molecules, proteins, and protein-ligand complexes, as well as various data formats, encompassing both 2D and 3D structures. Specifically, we introduce Mixture-of-Domain-Experts (MoDE) to handle the biomolecules from diverse biochemical domains and Mixture-of-Structure-Experts (MoSE) to capture positional dependencies in the molecular structures. The proposed mixture-of-experts approach enables BIT to achieve both deep fusion and domain-specific encoding, effectively capturing fine-grained molecular interactions within protein-ligand complexes. Then, we perform cross-domain pre-training on the shared Transformer backbone via several unified self-supervised denoising tasks. Experimental results on various benchmarks demonstrate that BIT achieves exceptional performance in downstream tasks, including binding affinity prediction, structure-based virtual screening, and molecular property prediction.
FlowDock: Geometric Flow Matching for Generative Protein-Ligand Docking and Affinity Prediction
Morehead, Alex, Cheng, Jianlin
Powerful generative models of protein-ligand structure have recently been proposed, but few of these methods support both flexible protein-ligand docking and affinity estimation. Of those that do, none can directly model multiple binding ligands concurrently or have been rigorously benchmarked on pharmacologically relevant drug targets, hindering their widespread adoption in drug discovery efforts. In this work, we propose FlowDock, a deep geometric generative model based on conditional flow matching that learns to directly map unbound (apo) structures to their bound (holo) counterparts for an arbitrary number of binding ligands. Furthermore, FlowDock provides predicted structural confidence scores and binding affinity values with each of its generated protein-ligand complex structures, enabling fast virtual screening of new (multi-ligand) drug targets. For the commonly-used PoseBusters Benchmark dataset, FlowDock achieves a 51% blind docking success rate using unbound (apo) protein input structures and without any information derived from multiple sequence alignments, and for the challenging new DockGen-E dataset, FlowDock matches the performance of single-sequence Chai-1 for binding pocket generalization. Additionally, in the ligand category of the 16th community-wide Critical Assessment of Techniques for Structure Prediction (CASP16), FlowDock ranked among the top-5 methods for pharmacological binding affinity estimation across 140 protein-ligand complexes, demonstrating the efficacy of its learned representations in virtual screening. Source code, data, and pre-trained models are available at https://github.com/BioinfoMachineLearning/FlowDock.
Binding Affinity Prediction: From Conventional to Machine Learning-Based Approaches
Liu, Xuefeng, Jiang, Songhao, Duan, Xiaotian, Vasan, Archit, Liu, Chong, Tien, Chih-chan, Ma, Heng, Brettin, Thomas, Xia, Fangfang, Foster, Ian T., Stevens, Rick L.
Protein-ligand binding [Clyde et al., 2023] refers to the process, shown in Figure 1, by which ligands (usually small molecules, ions, or proteins) generate signals by binding to the active sites of target proteins through intermolecular forces. This binding typically changes the conformation of target proteins, which then results in the realization, modulation, or alteration of protein functions. Therefore, protein-ligand binding plays a central role in most, if not all, important life processes. For example, oxygen molecules are bound and carried through the human body by proteins like hemoglobin and then utilized for energy production, while nonsteroidal anti-inflammatory drugs (NSAIDs) like ibuprofen work by inhibiting the cyclooxygenase (COX) enzyme, thereby reducing the release of pain-causing substances in the body. The concept and importance of binding affinity prediction were first addressed in Böhm [1994]: given the 3D structures of a target protein and a potential ligand, the objective is to predict the binding constant of such a complex, along with the most probable binding pose candidates. Prediction of the binding site (the set of protein residues that have at least one non-hydrogen atom within 4.0 Å of a ligand's non-hydrogen atom [Khazanov and Carlson, 2013]) and of the affinity (binding constants such as inhibition or dissociation constants, or the concentration at 50% inhibition) is usually divided into two separate but related stages [Ballester and Mitchell, 2010a]. One notable motivation for constructing a good binding affinity predictor (or scoring function, as it is called in some earlier work) is the essential role that it plays in drug discovery [Liu et al., 2023, 2024a] and virtual screening [Meng et al., 2011, Pinzi and Rastelli, 2019, Sadybekov and Katritch, 2023]. Traditional drug discovery essentially involves a process of trial and error.
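The binding-site definition quoted above (residues with at least one non-hydrogen atom within 4.0 Å of a ligand non-hydrogen atom) can be sketched in a few lines. The input format here is an assumption for illustration: protein atoms are (residue_id, element, (x, y, z)) tuples and ligand atoms are (element, (x, y, z)) tuples, with coordinates in ångströms. The function name is hypothetical.

```python
import math


def binding_site_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return the set of residue ids with a heavy atom within `cutoff` of the ligand.

    Hydrogens on both the protein and the ligand are ignored.
    """
    ligand_heavy = [xyz for element, xyz in ligand_atoms if element != "H"]
    site = set()
    for residue_id, element, xyz in protein_atoms:
        # Skip hydrogens and residues that already qualified.
        if element == "H" or residue_id in site:
            continue
        if any(math.dist(xyz, lig_xyz) <= cutoff for lig_xyz in ligand_heavy):
            site.add(residue_id)
    return site
```

This brute-force version compares every protein heavy atom with every ligand heavy atom; for large structures a spatial index such as a k-d tree would be the usual replacement.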